Streaming methods and systems

ABSTRACT

Various embodiments provide methods and systems for streaming data that can facilitate streaming during bandwidth fluctuations in a manner that can enhance the user experience. In one aspect, a forward-shifting technique is utilized to buffer data that is to be streamed, e.g. an enhancement layer in an FGS stream. Various techniques can drop layers actively when bandwidth is constant. The saved bandwidth can then be used to pre-stream enhancement layer portions. In another aspect, a content-aware decision can be made as to how to drop enhancement layers when bandwidth decreases. During periods of decreasing bandwidth, if a video segment does not contain important content, the enhancement layers will be dropped to keep the forward-shifting of the enhancement layer unchanged. If the enhancement layer does contain important content, it will be transmitted later when bandwidth increases.

TECHNICAL FIELD

[0001] This invention relates to data streaming methods and systems and, in particular, to scalable video streaming methods and systems.

BACKGROUND

[0002] Many scalable video-coding approaches have been proposed over the past few years for real-time Internet applications. In addition, several video scalability approaches have been adopted by video compression standards such as MPEG-2, MPEG-4, and H.26x. Temporal, spatial, and quality (SNR) scalability types have been defined in these standards.

[0003] All of these types of scalable video consist of a so-called base layer and one or multiple enhancement layers. The base layer part of the scalable video stream represents, in general, the minimum amount of data needed for decoding that stream. The enhancement layer part of the stream represents additional information, and therefore it enhances the video signal representation when decoded by a receiver. For each type of video scalability, a certain scalability structure is used. The scalability structure defines the relationship among the pictures of the base layer and the pictures of the enhancement layer.

[0004] Another type of scalability, which has been primarily used for coding still images, is fine granular scalability (FGS). Images coded with this type of scalability can be decoded progressively. In other words, the decoder can start decoding and displaying the image after receiving a very small amount of data. As more data is received, the quality of the decoded image is progressively enhanced until the complete information is received, decoded, and displayed.

[0005] The FGS encoding framework provides a good balance between coding efficiency and a very simple scalability structure. As shown in FIG. 1, the FGS structure consists of two layers: a base layer coded at a bitrate R_BL and an enhancement layer coded using a fine-granular (or embedded) scheme to a maximum bitrate of R_max. FIG. 1 shows examples of the FGS scalability structure at the encoder (left), streaming server (center), and decoder (right) for a typical unicast Internet streaming application. The top and bottom rows of the figure represent base layers without and with bi-directional (B) frames, respectively.

[0006] This structure provides a very efficient, yet simple, level of abstraction between the encoding and streaming processes. The encoder only needs to encode the video as a base layer and an enhancement layer, and it does not need to be aware of the particular bitrate at which the content will be streamed. The streaming server, on the other hand, has total flexibility in sending any desired portion of any enhancement layer frame (in parallel with the corresponding base layer picture), without the need for performing complicated real-time transcoding algorithms. This enables the server to handle a very large number of unicast streaming sessions, and to adapt to their bandwidth variations in real time. On the receiver side, the FGS framework adds a small amount of complexity and memory requirements to any standard motion-compensation based video decoder. These advantages of the FGS framework are achieved while maintaining surprisingly good coding-efficiency results.

[0007] One of the problems that continues to present itself in the context of streaming applications is that of limited and/or fluctuating bandwidth. That is, as congested networks such as the Internet continue to find wide and varied use, bandwidth can become limited and can fluctuate during periods of higher and lower usage. The sending rate of a video stream has to be adjusted accordingly. The resulting “jittering” in quality can be very annoying to most video viewers. Accordingly, streaming vendors such as video vendors endeavor to provide constant or smooth quality. Due to the burstiness of video streams and bandwidth fluctuations of the transmission media, achieving this goal can be very challenging.

[0008] FGS coding provides the possibility of adapting a video stream to take into account the available bandwidth. However, the FGS coding scheme itself does not provide for any smoothing techniques when bandwidth decreases sharply. This becomes especially important when consideration is given to the enhancement layers that contain important content in FGS streaming applications. One straightforward utilization of FGS coding in the context of limited bandwidth situations can involve simply dropping portions of the enhancement layer when bandwidth becomes limited in order to decrease the bit rate of the data stream. But, when bandwidth sharply decreases, as is often the case, too much of the enhancement layer can be dropped and there is no way to guarantee the quality of the video. That is, as bandwidth decreases when important content is being streamed, there is no protection scheme to prevent the important content from being dropped.

[0009] Accordingly, this invention arose out of concerns associated with providing improved streaming methods and systems. In particular, this invention arose out of concerns associated with providing methods and systems for scalable streaming.

SUMMARY

[0010] Various embodiments provide methods and systems for streaming data that can facilitate streaming during bandwidth fluctuations in a manner that can enhance the user experience. In one aspect, a forward-shifting technique is utilized to buffer data that is to be streamed, e.g. an enhancement layer in an FGS stream. To forward-shift the video stream, unimportant enhancement layers are actively dropped when bandwidth is constant. The saved bandwidth can then be used to pre-stream following stream portions. It is to be appreciated and understood that the base layer is guaranteed to be delivered as bandwidth is sufficient to transmit the very low bit rate base layer. In another aspect, a content-aware decision can be made as to how to drop enhancement layers when bandwidth decreases. During periods of decreasing bandwidth, if a video segment does not contain important content, the enhancement layers will be dropped to keep the forward-shifting of the enhancement layer unchanged. If the enhancement layer does contain important content, it will be transmitted later when bandwidth increases.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] FIG. 1 is a diagram that describes one specific FGS video streaming example, and is useful for understanding one or more of the inventive embodiments described herein.

[0012] FIG. 2 is a block diagram of an exemplary computer system that can be utilized to implement one or more embodiments.

[0013] FIGS. 3a-c are diagrams that can assist in understanding aspects of a forward-shifting technique in accordance with one embodiment.

[0014] FIGS. 4a-g are diagrams that can assist in understanding aspects of a forward-shifting technique in accordance with one embodiment.

[0015] FIG. 5 is a flow diagram that describes steps in a method in accordance with one embodiment.

[0016] FIGS. 6-8 are diagrams that can assist in understanding processing steps in accordance with one embodiment.

[0017] FIGS. 9-11 are diagrams that illustrate aspects of two models in accordance with one embodiment.

[0018] FIG. 12 is a diagram that illustrates an exemplary window in accordance with one embodiment.

[0019] FIG. 13 is a block diagram of an exemplary system in accordance with one embodiment.

[0020] FIG. 14 is a block diagram of an exemplary rate controller in accordance with one embodiment.

[0021] FIG. 15 is a state diagram in accordance with one embodiment.

[0022] FIGS. 16 and 17 are diagrams of exemplary bandwidth curves.

[0023] FIGS. 18 and 19 are diagrams that show behaviors of an exemplary state machine in accordance with one bandwidth curve.

[0024] FIGS. 20 and 21 are diagrams that show behaviors of an exemplary state machine in accordance with another bandwidth curve.

DETAILED DESCRIPTION

[0025] Overview

[0026] The embodiments described below provide methods and systems for streaming data that can facilitate streaming during bandwidth fluctuations in a manner that can enhance the user experience. In one particular embodiment described below, various inventive techniques are described in the context of FGS coding. It is to be appreciated, however, that the FGS environment constitutes but one exemplary implementation and such is not intended to limit application of the claimed subject matter only to FGS systems, except when specifically claimed.

[0027] In various embodiments, the shifting or processing of the layers can take place continuously, thus ensuring, to a great degree, that when bandwidth fluctuates, the client does not undesirably experience a degradation in streaming quality. The amount of forward-shifting can advantageously be determined by the size of the client side buffer into which the content is forward-shifted. The size of the client side buffer can be selected and designed to accommodate the duration of the drop in network bandwidth. Optimal or desirable buffer sizes can be statistically determined taking into account bandwidth reduction duration times. For example, network bandwidth typically fluctuates for a determinable amount of time (e.g. 0.5-10 seconds). This fluctuation duration can be used as a guide for sizing the client buffer.
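
By way of illustration only, the buffer-sizing guideline above can be sketched in Python. The function name, the parameter names, and the simple rate-times-duration rule are assumptions made for this example and are not taken from the described embodiments; an actual system might instead derive the size statistically from measured bandwidth traces.

```python
def client_buffer_bits(normal_rate_bps, reduced_rate_bps, drop_duration_s):
    """Estimate how many pre-streamed bits cover one bandwidth drop.

    Hypothetical rule: the buffer must hold the bits that the reduced
    link cannot deliver for the duration of the drop.
    """
    if reduced_rate_bps > normal_rate_bps:
        raise ValueError("reduced rate must not exceed the normal rate")
    if drop_duration_s < 0:
        raise ValueError("drop duration must be non-negative")
    # Shortfall per second multiplied by the length of the drop.
    return (normal_rate_bps - reduced_rate_bps) * drop_duration_s


# Sizing for the longest drop in the 0.5-10 second range mentioned above,
# with illustrative (assumed) link rates of 512 kbit/s and 384 kbit/s.
worst_case_bits = client_buffer_bits(512_000, 384_000, 10)
```

Sizing for the longest expected drop is the conservative choice; a shorter design duration trades buffer memory against the risk of dropping important layers.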

[0028] One of the advantages of continuously attempting to forward shift the layers is that layers can be built up on the client side so that if there is a fluctuation in network bandwidth, the quality on the client side is not seriously degraded. As discussed below, the importance of the content can be ascertained. This can facilitate the forward-shifting process by enabling content that is determined to be important to be more actively forward-shifted when network bandwidth is available. Similarly, when network bandwidth drops, the forward-shifting of the layers can be adjusted to ensure that content that is determined to be unimportant is not meaningfully forward shifted. Thus, various embodiments can provide techniques that determine the importance of the content and then make decisions to forward shift layers in accordance with fluctuations in network bandwidth and the content's importance. When content is important and network bandwidth is available, then the layers can be more actively forward shifted. When the content is unimportant and network bandwidth drops, the forward-shifting can be less active or inactive.

[0029] In one aspect, a forward-shifting technique is utilized to buffer the enhancement layer. Instead of dropping layers passively when bandwidth decreases, unimportant enhancement layers are actively dropped when bandwidth is constant. The saved bandwidth can then be used to pre-stream enhancement layer portions. As a result, the whole enhancement layer can be shifted forward by a certain number of bits. This provides opportunities for content-aware decisions when bandwidth decreases. The forward-shifting technique is, in some respects, like a bridge between FGS coding and content analysis. It is to be appreciated and understood that the base layer is guaranteed to be delivered as bandwidth is sufficient to transmit the very low bit rate base layer.

[0030] In another aspect, a content-aware decision can be made as to how to drop enhancement layers when bandwidth decreases. During periods of decreasing bandwidth, if the video segment does not contain important content, the enhancement layers will be dropped to keep the forward-shifting of the enhancement layer unchanged. If the enhancement layer does contain important content, it will be transmitted later when bandwidth increases. The forward-shifted bits can help to guarantee that the clients will not suffer from buffer underflow. In this way, important content can be protected. If the bandwidth is constant, more bits of important layers are transmitted and some high layers of unimportant blocks are dropped to make room for important layers. Content can be analyzed online or offline because various embodiments operate in the scenario of streaming stored video.

[0031] Exemplary Computer Environment

[0032] FIG. 2 illustrates an example of a suitable computing environment 200 on which the system and related methods for processing media content may be implemented.

[0033] It is to be appreciated that computing environment 200 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the media processing system. Neither should the computing environment 200 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary computing environment 200.

[0034] The media processing system is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the media processing system include, but are not limited to, personal computers, server computers, thin clients, thick clients, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

[0035] In certain implementations, the system and related methods may well be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The media processing system may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

[0036] In accordance with the illustrated example embodiment of FIG. 2, computing system 200 is shown comprising one or more processors or processing units 202, a system memory 204, and a bus 206 that couples various system components including the system memory 204 to the processor 202.

[0037] Bus 206 is intended to represent one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnects (PCI) bus also known as Mezzanine bus.

[0038] Computer 200 typically includes a variety of computer readable media. Such media may be any available media that is locally and/or remotely accessible by computer 200, and it includes both volatile and non-volatile media, removable and non-removable media.

[0039] In FIG. 2, the system memory 204 includes computer readable media in the form of volatile memory, such as random access memory (RAM) 210, and/or non-volatile memory, such as read only memory (ROM) 208. A basic input/output system (BIOS) 212, containing the basic routines that help to transfer information between elements within computer 200, such as during start-up, is stored in ROM 208. RAM 210 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit(s) 202.

[0040] Computer 200 may further include other removable/non-removable, volatile/non-volatile computer storage media. By way of example only, FIG. 2 illustrates a hard disk drive 228 for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”), a magnetic disk drive 230 for reading from and writing to a removable, non-volatile magnetic disk 232 (e.g., a “floppy disk”), and an optical disk drive 234 for reading from or writing to a removable, non-volatile optical disk 236 such as a CD-ROM, DVD-ROM or other optical media. The hard disk drive 228, magnetic disk drive 230, and optical disk drive 234 are each connected to bus 206 by one or more interfaces 226.

[0041] The drives and their associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules, and other data for computer 200. Although the exemplary environment described herein employs a hard disk 228, a removable magnetic disk 232 and a removable optical disk 236, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROM), and the like, may also be used in the exemplary operating environment.

[0042] A number of program modules may be stored on the hard disk 228, magnetic disk 232, optical disk 236, ROM 208, or RAM 210, including, by way of example, and not limitation, an operating system 214, one or more application programs 216 (e.g., multimedia application program 224), other program modules 218, and program data 220. A user may enter commands and information into computer 200 through input devices such as keyboard 238 and pointing device 240 (such as a “mouse”). Other input devices may include an audio/video input device(s) 253, a microphone, joystick, game pad, satellite dish, serial port, scanner, or the like (not shown). These and other input devices are connected to the processing unit(s) 202 through input interface(s) 242 that is coupled to bus 206, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB).

[0043] A monitor 256 or other type of display device is also connected to bus 206 via an interface, such as a video adapter 244. In addition to the monitor, personal computers typically include other peripheral output devices (not shown), such as speakers and printers, which may be connected through output peripheral interface 246.

[0044] Computer 200 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 250. Remote computer 250 may include many or all of the elements and features described herein relative to computer 200.

[0045] As shown in FIG. 2, computing system 200 is communicatively coupled to remote devices (e.g., remote computer 250) through a local area network (LAN) 251 and a general wide area network (WAN) 252. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet.

[0046] When used in a LAN networking environment, the computer 200 is connected to LAN 251 through a suitable network interface or adapter 248. When used in a WAN networking environment, the computer 200 typically includes a modem 254 or other means for establishing communications over the WAN 252. The modem 254, which may be internal or external, may be connected to the system bus 206 via the user input interface 242, or other appropriate mechanism.

[0047] In a networked environment, program modules depicted relative to the personal computer 200, or portions thereof, may be stored in a remote memory storage device. By way of example, and not limitation, FIG. 2 illustrates remote application programs 216 as residing on a memory device of remote computer 250. It will be appreciated that the network connections shown and described are exemplary and other means of establishing a communications link between the computers may be used.

[0048] Exemplary First Embodiment

[0049] In the embodiment about to be described, the inventive forward-shifting process and content-aware decision are explained in the context of streaming an FGS stream. It is to be appreciated and understood that the inventive forward-shifting process and content-aware decision can be performed in connection with streaming non-FGS streams, e.g. an MPEG-1 video stream, and that the FGS-specific description is given, among other reasons, to illustrate to the reader one specific context in which the inventive process can be employed. The forward-shifting process can also be used for streaming spatially, temporally, and SNR scalable video defined in MPEG-2, MPEG-4, and H.26L.

[0050] Server Initialization

[0051] Before streaming of the FGS stream, an initialization process is performed so that the streaming server can begin streaming an initially forward-shifted enhancement layer, and thus forward-shift the whole video stream. For the sake of clarity, it is assumed that available bandwidth is constant during the initialization process. This is diagrammatically shown in FIG. 3a, where R₁ represents the bit rate of the base layer, and R₂-R₁ represents the bit rate of the enhancement layer. Additionally, both the base layer and the enhancement layer are assumed to be CBR (Constant Bit Rate) streams. Representative base and enhancement layers are indicated in FIG. 3b. Further, it is to be appreciated that in the illustrated FGS stream, multiple enhancement layers are supported. In this particular enhancement layer embodiment, lower layers of the enhancement layer are deemed more important than higher layers of the enhancement layer. That is, the enhancement layer is encoded using bit-plane DCT. Accordingly, low layers represent more important bit-planes and thus are more important.

[0052] In the illustrated example, the bit stream of the enhancement layer is divided into individual blocks. Each block has the same number of bits. The time scale and exact number of bits of each block are typically specified in particular applications.

[0053] At the beginning of the FGS video streaming, a first block is dropped to enable the enhancement layer to be forward shifted. Thus the whole video stream is forward shifted as the base layer is guaranteed to be delivered. In this particular example, block 1 is always dropped. The saved bandwidth is then used to transmit block 2. This is diagrammatically shown in FIG. 3c where block 2 is depicted in the space that was previously occupied by block 1. Accordingly, after the transmission of block 0 and block 2, block 3 will be transmitted from time t₂. As a result, the FGS enhancement layer is shifted forward by a block size from t₂ on. This is apparent when a comparison is made between FIGS. 3b and 3c. Notice that in FIG. 3b, block 3 is transmitted at time t₃. In FIG. 3c, however, block 3 is transmitted at time t₂. Since the dropped bits of block 1 are in the high layers of the enhancement layer, the video quality will not be severely degraded.

[0054] At this point in the process, the server initialization process is complete and has produced a forward-shifted enhancement layer. This forward-shifted enhancement layer provides a buffer for bandwidth decreases that might be experienced in the future.
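
The initialization just described can be illustrated with the following minimal Python sketch. The function names and the mapping of slot index i to time tᵢ are assumptions introduced for this example only; the sketch merely reproduces the ordering shown in FIGS. 3b and 3c, in which block 1 is dropped and every later block moves one slot earlier.

```python
def initial_schedule(num_blocks, dropped_block=1):
    """Return the block indices in transmission order after initialization.

    The dropped block is skipped, so each following block is sent one
    slot earlier than in the original (unshifted) stream.
    """
    if not 0 <= dropped_block < num_blocks:
        raise ValueError("dropped_block must be a valid block index")
    return [b for b in range(num_blocks) if b != dropped_block]


def slot_of(block, dropped_block=1):
    """Return the slot in which `block` is sent, or None if it is dropped."""
    if block == dropped_block:
        return None
    # Blocks before the dropped one keep their slot; later ones shift by one.
    return block if block < dropped_block else block - 1


order = initial_schedule(5)   # blocks 0, 2, 3, 4 are sent in that order
slot_for_block_3 = slot_of(3) # block 3 moves from slot 3 to slot 2
```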

[0055] Content-Aware Layer Dropping

[0056] In the discussion that follows, a content-aware decision-making process is described that enables intelligent decisions to be made as to how to drop enhancement layer portions when bandwidth decreases occur. Essentially, during periods of decreased bandwidth, if the video segment does not contain important content, portions of the enhancement layers can be dropped to keep the forward-shifting of the enhancement layer unchanged. If the video segment does contain important content, then the enhancement layer will be transmitted later, when bandwidth increases. The forward-shifted bits can, in many instances, guarantee that the clients will not suffer from buffer underflow. In this way, important content is protected and the overall video quality is smoothed. If the bandwidth is constant, more bits of important layers are transmitted and some high layers of unimportant layers are dropped to make room for important layers.

[0057] As an example, consider the following. At best, network bandwidth, such as the bandwidth of the Internet, varies commonly and randomly. When bandwidth decreases, the FGS video server inevitably has to drop some layers. The described forward-shifting technique makes it possible to selectively drop layers so that layers with important content can be protected, and more unimportant layers can be dropped. That is, when bandwidth decreases, the enhancement layer will be transmitted later if it contains important content. Otherwise, if the enhancement layer does not contain important content, some of its layers (or portions thereof) can be dropped.

[0058] To illustrate this method more clearly, consider the example presented in FIGS. 4a-g.

[0059] FIG. 4a shows an original enhancement layer, which is divided into blocks of equal size. The illustrated enhancement layer comprises blocks n−2 through n+4. FIG. 4b shows the forward-shifted stream after the initialization process described above.

[0060] Now, assume, as shown in FIG. 4c, that the bandwidth decreases to R₃ from t−1 to t. As a result, the content importance of block n is now analyzed. It will be appreciated that content analysis can be performed online or offline, as mentioned above. In this example, content analysis is performed online. Content analysis can be performed using any suitable content analysis algorithms or tools. Exemplary tools will be understood and apparent to those of skill in the art. As but a few examples, content analysis can take place using such things as perception model based or structure model based content analysis. One specific example of how this can be done is given below.

[0061] Content analysis enables a determination to be made as to whether or not selected layers of block n are to be dropped. In this example, the content analysis enables a determination to be made as to whether to drop the high layers (i.e. less important layers) of the particular block. If the content of block n is determined not to be important, then the selected high layers of block n are dropped. This is diagrammatically illustrated in FIG. 4d.

[0062] Thus, the integrity of the forward-shifted enhancement layer is preserved. That is, the forward-shifted enhancement layer remains forward shifted without any changes. If, on the other hand, the content of block n is determined to be important, then the following processing can take place.

[0063] First, as shown in FIG. 4e, transmission of the high layers of block n is delayed somewhat. These layers will be transmitted later when bandwidth increases again. Accordingly, from time t−1 to t, only part of block n is transmitted. It should be pointed out that this will not result in buffer underflow at the client side because sufficient bits have been forward-shifted.

[0064] When bandwidth increases at time t, the delayed layers of block n can now be transmitted. The result of this delayed transmission is that the whole enhancement layer is now shifted by half a block. After block n is completely transmitted (i.e. including the high layers), the process again seeks to actively drop some unimportant high layers to ensure that the whole enhancement layer is shifted by a block size to prepare for the next bandwidth decrease.

[0065] This can be accomplished as follows. If any block after block n is determined to be unimportant (as by suitable content analysis), its high layers will be actively dropped. For example, in FIG. 4f, assume that block n+1 is determined to be unimportant. Accordingly, high layers of block n+1 are dropped. This results in the enhancement layer being again forward-shifted by a block after t+1. This is diagrammatically shown in FIG. 4g. There, by comparison with FIG. 4f, block n+2 is to be transmitted at time t+1, rather than some time later as indicated in FIG. 4f. At this point, the FGS video server is ready for the next bandwidth decrease.
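
The bookkeeping of the forward shift in the example of FIGS. 4e-g can be illustrated with the following minimal Python sketch. It is offered under stated assumptions: the shift is measured in units of one block, the delayed and dropped high layers are each taken to be half a block (consistent with the half-block shift mentioned above), and the function and event names are hypothetical.

```python
from fractions import Fraction


def track_shift(events, initial_shift=Fraction(1)):
    """Return the forward shift (in blocks) after each event.

    events: sequence of (kind, amount) pairs, where kind is
      "delay" - high layers sent late, which consumes forward shift
      "drop"  - high layers dropped, which restores forward shift
    """
    shift = initial_shift
    history = [shift]
    for kind, amount in events:
        if kind == "delay":
            shift -= amount
        elif kind == "drop":
            shift += amount
        else:
            raise ValueError(f"unknown event kind: {kind}")
        if shift < 0:
            # More bits were delayed than had been pre-streamed.
            raise ValueError("client buffer would underflow")
        history.append(shift)
    return history


half = Fraction(1, 2)
# Block n: high layers delayed; block n+1: high layers dropped.
history = track_shift([("delay", half), ("drop", half)])
```

In this sketch the shift falls from one block to half a block while the high layers of block n are delayed, and returns to one block once the high layers of block n+1 are dropped.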

[0066] It is to be appreciated and understood that the base layer is guaranteed to be delivered and the receiver (on the client end) can synchronize the transmitted enhancement layer and base layer. This forward-shifting technique does not require extra client buffering or extra delay.

[0067] It should be pointed out that this embodiment permits the block sizes to be different, and that the time scales of bandwidth decrease can vary. This flexibility comes both from flexible bit-dropping permitted by FGS, and from the shifting mechanism of the described embodiment. An accurate network bandwidth model can further be helpful, but the described embodiment does not depend on a network bandwidth model. If the bandwidth decreases severely and pre-streamed bits are insufficient, dropping low layers of the enhancement layer may become inevitable.

[0068] Thus, in this embodiment, based on FGS coding, two advantageous contributions are made to solve problems associated with decreasing bandwidth. First, a forward-shifting technique facilitates buffering the enhancement layer and thus the whole stream. Instead of dropping layers passively when bandwidth decreases, the described embodiment can drop layers actively when bandwidth is constant. The saved bandwidth is then used to pre-stream later portions of the enhancement layer. As a result, the whole enhancement layer is shifted forward by a certain number of bits. Second, content-aware decisions can be made as to how to drop enhancement layers when bandwidth decreases. Because the described embodiment is not dependent on any one particular method of analyzing content, flexible solutions can be provided for accommodating various different ways of analyzing content.

[0069] FIG. 5 is a flow diagram that describes steps in a method in accordance with one embodiment. The steps can be implemented in any suitable hardware, software, firmware, or combination thereof. In the illustrated example, the steps can be implemented by a suitably programmed streaming server.

[0070] Step 500 drops at least one enhancement layer block and step 502 shifts following enhancement layer blocks forward. These two steps desirably initialize the streaming server so as to forward shift the enhancement layer in anticipation of a bandwidth decrease. The base layer is also transmitted at the initialization step because the enhancement layer cannot be decoded without the base layer. The server can now start streaming following base and enhancement layers. One specific example of how this can be done is given above.

[0071] Step 504 determines whether there has been a decrease in the available bandwidth. If there is no bandwidth decrease, then step 506 continues to transmit base and enhancement layers. If, on the other hand, there is a bandwidth decrease, then step 508 analyzes the current enhancement layer block. Analysis of the current block can take place by analyzing the content of the current video segment to which the block belongs. Various content analysis techniques can be used, with but a few examples being given above. In addition, step 508 can be performed online (i.e. during streaming) or offline (prior to streaming). If, at step 510, the current enhancement layer block is determined to not be important, then step 512 drops one or more high layers of the block and continues transmission. This step is directed to preserving the forward-shifted enhancement layer. If, on the other hand, step 510 determines that the content of the current enhancement layer block is important, then step 514 delays transmission of current block portions. Step 516 determines whether bandwidth has increased. If not, the method returns to step 514 and continues monitoring bandwidth for an increase. If the bandwidth has increased, then step 518 transmits the previously-delayed block portions. Step 520 then forward shifts the enhancement layer again by actively dropping some unimportant high layers. The method is now ready for another bandwidth decrease and can return to step 504.
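
The decision flow of FIG. 5 can be illustrated with the following minimal Python sketch. It is a simplification under stated assumptions: time is divided into slots, each slot carries a boolean bandwidth indication and a boolean result of content analysis, and the class name, field names, and action strings are hypothetical labels introduced for this example. The initialization of steps 500 and 502 is assumed to have already taken place.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    bandwidth_ok: bool  # False models a bandwidth decrease (step 504)
    important: bool     # content analysis result (steps 508 and 510)


def plan_actions(slots):
    """Return one action label per slot, following the FIG. 5 flow."""
    actions = []
    delayed = False       # high layers of an important block are pending
    need_reshift = False  # the forward shift still has to be restored
    for slot in slots:
        if not slot.bandwidth_ok:
            if delayed:
                actions.append("wait")               # step 516: no increase yet
            elif slot.important:
                actions.append("delay_high_layers")  # step 514
                delayed = True
            else:
                actions.append("drop_high_layers")   # step 512
        elif delayed:
            actions.append("send_delayed_layers")    # step 518
            delayed = False
            need_reshift = True
        elif need_reshift and not slot.important:
            actions.append("drop_high_layers_to_reshift")  # step 520
            need_reshift = False
        else:
            actions.append("transmit")               # step 506
    return actions


example = [
    Slot(True, False),   # steady bandwidth
    Slot(False, False),  # decrease, unimportant block
    Slot(False, True),   # decrease, important block
    Slot(True, True),    # bandwidth recovers
    Slot(True, False),   # unimportant block: restore the shift
    Slot(True, False),
]
plan = plan_actions(example)
```

The sketch restores the forward shift only at the first unimportant block after the delayed layers are sent, which mirrors the example of FIGS. 4f and 4g; other policies are possible.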

[0072] The embodiment just described provides a new content-aware video streaming method that can advantageously enhance the user experience. In but one implementation, the method can be employed to stream FGS streams. The method can go a long way toward ensuring a desirable level of video quality when network bandwidth decreases sharply.

[0073] Perceptual Temporal Video Adaptation Techniques

[0074] Dropping B frames (i.e. bi-directionally predictive-coded frames) is one of the major techniques for rate adaptation to bandwidth decreases in video streaming applications. Yet, how to maximize user satisfaction when dropping B frames continues to be an open area for exploration and improvement. Dropping frames causes motion judder because the dropped frames are usually replaced by replaying previous frames. One sees judder after frame dropping because the human eye and brain try to track the smooth motion of moving objects. That is, when a frame is repeated, one's brain is usually confused and does a double take. This is the perceived motion judder, which is very annoying to viewers. In fact, a viewer's perceived motion judder and his/her satisfaction with frame dropping depend heavily on the motion in the video sequence. That is, frame dropping with different motion patterns will result in different levels of user satisfaction.

[0075] From the sampling theorem point of view, dropping frames means decreasing the temporal sampling rate, which possibly results in temporal aliasing. Hence, dropping frames of lower temporal frequency is preferable to dropping frames of higher temporal frequency. In most video systems today, dropping frames means repeating previous frames, which will result in motion judder. It is found that frame repetition gives good results where no motion is present, but fails in moving areas, resulting in clearly visible motion judder in frame rate up-conversion applications. In particular, dropping frames with camera pan motion is more annoying than dropping frames with other kinds of motion, as motion judder is most noticeable on camera pans in video. Therefore, to model user satisfaction with frame dropping, the low-level motion description feature should embody such characteristics of human perception. The closer to human perception characteristics a motion description is, the more accurate the learned model will be. In various described embodiments, a low-level feature named PME (Perceived Motion Energy) is introduced to describe motion intensity in video sequences.

[0076] In the embodiments described below, user satisfaction with B frame dropping is modeled using two inventive MMS-PC (Mean MOS Score-PME Class) models that predict user satisfaction from low-level motion features. In the models, video motion is described in a way that the motion feature is highly correlated with user satisfaction with frame rate decrease. Video sequences are separated into successive segments using the motion feature, and video segments are classified by the motion feature. Learning from large MOS (Mean Opinion Score) test results, video segments are classified into several classes, and two models that map these segment classes to user satisfaction with two typical frame rate decreases are obtained. High correlation between prediction by the models and real MOS test results enables construction of a priority-based model to describe which frames are important to human perception and which frames are not. As a result, the video adaptation scheme can be based on the priority-based delivery model to protect frames that are more important to viewers' perception.

[0077] The forward-shifting technique and content-aware decision-making, which are described in the FGS video streaming environment, are utilized in the perceptual temporal video adaptation scheme. A state machine is provided to implement the adaptation scheme. The state of the state machine transits to absorb short-term bandwidth or bit-rate variations, and the mode of the state machine transits to adapt to long-term bandwidth or bit-rate variations. The state of the state machine is determined by client buffer occupancy, available bandwidth, and the priorities and sizes of current frames. The mode of the state machine is decided by the bit rate of the stream and average bandwidth.

[0078] MMS-PC Models

[0079] Because of the frame interdependency in MPEG, dropping P frames or dropping I frames will result in very annoying motion judder even when the motion is slow. Accordingly, the described embodiments focus on B frame dropping. Two typical dropping percentages, that is, 50% and 100%, are used to obtain two degraded frame rates of original video sequences.

[0080] To human perception, dropping frames of low motion intensity is less perceptible than dropping frames of high motion intensity. Dropping frames with camera pan motion is more annoying than dropping frames with other kinds of motion. Accordingly, the low-level feature for motion description should embody such characteristics of human perception of motion. The closer to human perception characteristics the motion description is, the more accurate the learned model will be. In the embodiment described below, a low-level feature designated “PME” for “perceived motion energy” is introduced, and a model is developed to separate video sequences into segments according to PME values. In the described embodiment, the developed model is a triangle model, as will become apparent below.

[0081] Video viewers are the ultimate judges of video quality. Given this, the described models learn from video viewers by learning the results of a large number of MOS tests. In the described embodiment, the DCR (Degradation Category Rating, ITU-T Recommendation P.910) MOS test scheme is utilized. The whole MOS test is divided into independent test sessions. The MMS-PC models learn from the test results by supervised clustering and regression.

[0082] Perceived Motion Energy

[0083] In an MPEG stream, there are two motion vectors in each macro block of a B-frame for motion compensation, often referred to as the “motion vector field” (MVF). Since the magnitude of a motion vector reflects the motion velocity of a macro block, it can be used to compute the energy of motion at frame scale. Although the angle of a motion vector is not reliable to represent the motion direction of a macro block, the spatial consistency of angles of motion vectors reflects the intensity of global motion. The spatial motion consistency can be obtained by calculating the percentage of dominant motion direction in a whole frame. The more consistent the angles are, the higher the intensity of global motion is. The atypical samples in an MVF usually result in inaccurate energy accumulation, so the magnitudes of motion vectors in the MVF should be revised through a spatial filtering process before computing perceived motion energy.

[0084] The spatial filter used in the described embodiment is a modified median filter. The elements in the filter's window at macro block MB_(i,j) are denoted by Ω_(i,j) in MVF, where W_(s) is the width of the window. The filtered magnitude of the motion vector is computed by: $\mathrm{Mag}_{(i,j)} = \begin{cases} \mathrm{Mag}_{i,j} & \text{if } \mathrm{Mag}_{i,j} \leq \mathrm{Max\,4th}(\mathrm{Mag}_{k}) \\ \mathrm{Max\,4th}(\mathrm{Mag}_{k}) & \text{if } \mathrm{Mag}_{i,j} > \mathrm{Max\,4th}(\mathrm{Mag}_{k}) \end{cases} \qquad (1)$

[0085] where (k ∈ Ω_(i,j)), and the function Max 4th(Mag_(k)) returns the fourth value in the descending sorted list of magnitude elements Ω_(i,j) in the filter window.
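By way of example only, the clamping behavior of Eq. (1) can be sketched as follows. The function names, the default window width of 3, and the handling of windows that hold fewer than four magnitudes (frame borders) are assumptions made for illustration; the described embodiment does not specify them.

```python
def max_4th(values):
    """Return the fourth value of the descending sorted list.

    Assumption: if the window holds fewer than four values, the smallest
    value is returned instead.
    """
    ordered = sorted(values, reverse=True)
    return ordered[3] if len(ordered) >= 4 else ordered[-1]


def filter_magnitudes(mag, ws=3):
    """Clamp each macro block magnitude to Max 4th of its ws x ws window."""
    rows, cols = len(mag), len(mag[0])
    half = ws // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Collect the window around (i, j), truncated at the frame border.
            window = [
                mag[r][c]
                for r in range(max(0, i - half), min(rows, i + half + 1))
                for c in range(max(0, j - half), min(cols, j + half + 1))
            ]
            # Eq. (1): keep the magnitude unless it exceeds Max 4th.
            out[i][j] = min(mag[i][j], max_4th(window))
    return out
```

With a single outlier of 50 in a field of ones, the outlier is pulled down to 1 while the other magnitudes are left unchanged, which is the intended suppression of atypical samples.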

[0086] Then the spatially filtered magnitudes at each macro block position (i,j) are averaged by a second filter. The filter adopts an alpha-trimmed filter within a window, with the spatial size of W_(t)². All of the magnitudes in the window are sorted first. After the values at the two ends of the sorted list are trimmed, the rest of the magnitudes are averaged to form the mixture energy MixEn_(i,j), which includes the energy of both object and camera motion, denoted by (2). $\mathrm{MixEn}_{i,j} = \frac{1}{(M - 2 \times \lfloor \alpha M \rfloor \times W_{t}^{2})} \sum_{m = \lfloor \alpha M \rfloor + 1}^{M - \lfloor \alpha M \rfloor} \mathrm{Mag}_{i,j}(m) \qquad (2)$

[0087] where M is the total number of magnitudes in the window, ⌊αM⌋ equals the largest integer not greater than αM, and Mag_(i,j)(m) is the magnitude's value in the sorted list. The trimming parameter α (0 ≤ α ≤ 0.5) controls the number of data samples excluded from the accumulating computation. Then the average magnitude Mag(t) of motion vectors in the whole frame after the above filtering is calculated as

Mag(t)=β×(ΣMixFEn _(i,j)(t)/N+ΣMixBEn _(i,j)(t)/N)/2  (3)

[0088] where MixFEn_(i,j)(t) represents forward motion vectors and MixBEn_(i,j)(t) represents backward motion vectors. The definitions of MixFEn_(i,j)(t) and MixBEn_(i,j)(t) are similar to MixEn_(i,j) in Eq. (2). In Eq. (3), N is the number of macro blocks in the frame and β is set to 4.54. The percentage of dominant motion direction α(t) is defined as $\alpha(t) = \frac{\max(\mathrm{AH}(t,k),\ k \in [1,n])}{\sum_{k=1}^{n} \mathrm{AH}(t,k)} \qquad (4)$

[0089] The angle in 2π is quantized into n angle ranges. Then the number of angles in each range is accumulated over all of the forward motion vectors to form an angle histogram with n bins, denoted by AH(t,k), k ∈ [1,n]. So max(AH(t,k)) is the dominant direction bin among all motion directions. n is set to 16 throughout this work.
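As a minimal sketch of Eq. (4), the angle histogram and the dominant direction percentage might be computed as below. The function name and the representation of motion vectors as (dx, dy) pairs are assumptions, as is the choice to count zero-length vectors in the first bin.

```python
import math


def dominant_direction_ratio(vectors, n=16):
    """Fraction of forward motion vectors falling in the fullest angle bin."""
    histogram = [0] * n                       # AH(t, k) for k = 1..n
    for dx, dy in vectors:
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Guard against the wrapped angle rounding up to exactly 2*pi.
        k = min(int(angle / (2 * math.pi) * n), n - 1)
        histogram[k] += 1
    total = sum(histogram)
    return max(histogram) / total if total else 0.0
```

A pure camera pan, where every vector points the same way, yields a ratio of 1.0, while motion spread evenly over all 16 bins yields 1/16.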

[0090] The perceived motion energy (PME) of a B frame is computed as follows:

PME(t)=Mag(t)×α(t)  (5)

[0091] The first item on the right side of Eq. (5) is the average magnitude of motion vectors within a frame, which is expected to reflect the fact that dropping frames of low motion intensity is less perceptible than dropping frames of high motion intensity. The second item α(t) represents the percentage of the dominant motion direction. For instance, α(t) will make the contribution of motion from a camera pan more significant to PME, because α(t) will be very large if camera panning exists. If other camera motions or dominant object motions exist, α(t) will also be fairly large. This matches the fact that human eyes tend to track dominant motion in the scene. We define the PME feature in a way that is expected to closely embody characteristics of human perception, and this will be borne out by the high correlation between the learned MMS-PC models and individual test results.
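The chain from trimmed averaging through Eq. (3) to Eq. (5) can be illustrated with the following sketch. All function names are hypothetical. The trimmed mean is normalized by the number of retained samples, M − 2⌊αM⌋, which is the conventional alpha-trimmed mean and is an assumption here; the W_(t)² factor printed in Eq. (2) is not modeled. The default trimming parameter of 0.25 and the fallback for an empty retained set are likewise assumptions. Note that the trimming parameter `alpha` is distinct from the dominant direction percentage α(t), which is passed as `alpha_t`.

```python
import math

BETA = 4.54  # scaling constant given for Eq. (3)


def alpha_trimmed_mean(values, alpha=0.25):
    """Average a non-empty list after trimming floor(alpha*M) from each end."""
    ordered = sorted(values)
    m = len(ordered)
    trim = math.floor(alpha * m)
    kept = ordered[trim:m - trim]
    if not kept:                       # assumption: fall back to a middle sample
        kept = [ordered[m // 2]]
    return sum(kept) / len(kept)


def frame_magnitude(mix_forward, mix_backward, beta=BETA):
    """Eq. (3): scaled mean of forward and backward mixture energies."""
    n = len(mix_forward)               # N, the number of macro blocks
    return beta * (sum(mix_forward) / n + sum(mix_backward) / n) / 2


def perceived_motion_energy(mag_t, alpha_t):
    """Eq. (5): PME(t) = Mag(t) x alpha(t)."""
    return mag_t * alpha_t
```

Because α(t) never exceeds 1, PME(t) is at most Mag(t), and it reaches that bound only when every forward motion vector falls into one direction bin.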

[0092] Temporal Segmentation

[0093] As stated in the section above, the PME value is calculated for each B frame of a given video as the first step in determining the importance of each B frame in terms of its significance to perceived judder. The range of PME values is cut to [0, 200], as there are very few PME values larger than 200. Now the PME value sequence PME(t) is used to represent the original video sequence. The next step is to temporally segment this sequence into successive segments, each represented by a triangle model of a motion acceleration and deceleration cycle. Before performing this segmentation, PME(t) is filtered by an average filter within a window of 5 frames. This filter smooths noise out of the PME value sequence and makes the segmentation more accurate. Then, a model is used to segment the sequence into successive segments and represent each of the segments. In the illustrated example, the model comprises a triangle model, although other models can be used.

[0094] FIG. 6 shows an example. The left bottom vertex of the triangle represents the start point of the segment and its PME value is zero. The right bottom vertex of the triangle represents the end point of the segment and its PME value is also zero. The top vertex of the triangle represents the maximum PME value of the segment. So for segment i, the triangle model is represented by a triple (ts_(i),te_(i),PME_(i)), where ts_(i) is the start point, te_(i) is the end point, PME_(i) is the peak PME value of the segment, and PME(ts_(i))=PME(te_(i))=0. A special triangle model (ts_(i),te_(i),0) is used for successive zeros.

[0095] Use of the triangle model is inspired by the fact that the motion pattern of a typical scene is composed of a motion acceleration process and a following deceleration process. Accordingly, the left bottom vertex represents the start point of motion acceleration and the right bottom vertex represents the end point of motion deceleration. Within video sequences, this motion acceleration and deceleration pattern is repeated over and over again. FIG. 6 clearly shows repeats of the triangle pattern. Extensive experimental results have indicated that the triangle model works well. Of course, it is possible that other models could be used, without departing from the spirit and scope of the claimed subject matter.

[0096] To segment a sequence, in this embodiment, is to detect triangle patterns in the PME feature of the sequence. The PME value of the start point and that of the end point of a segment are both zero. So a simple search process can be used to find the triangle patterns. However, when motion continues for a long time, the triangle can become less accurate. FIG. 7 shows an example of this situation. To deal with continued motion, a splitting process can be performed before the triangle pattern search process. To split long continuous motion, splitting boundaries can first be found. For a particular point (t, PME(t)), if

PME(t)=min(PME(t−T), . . . , PME(t−i), . . . , PME(t+i), . . . , PME(t+T))

[0097] and PME(t+j)>0, j∈[−T,T]

[0098] then PME(t) is set to 0. So (t, PME(t)) now becomes a splitting boundary. Typically T is set to 100, as our statistics show that this value obtains a good trade-off between splitting long continuous motion and avoiding too many triangle patterns. That means some local minimums of the PME sequence are set as splitting boundaries. FIG. 8 shows the splitting results of FIG. 7. The two designated blocks show two splitting boundaries, which are local minimums of the original PME sequence. As a result, the large triangle in FIG. 7 is split into three small triangles.
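The preprocessing, splitting, and triangle search just described can be sketched, by way of example only, as three small functions. The function names are hypothetical. Truncating the averaging and splitting windows at the ends of the sequence, and testing the splitting condition against the unmodified sequence, are assumptions of this sketch rather than details taken from the described embodiment.

```python
def smooth_pme(pme, window=5, ceiling=200.0):
    """Cut PME values to [0, ceiling], then apply a moving average."""
    clipped = [min(max(v, 0.0), ceiling) for v in pme]
    half = window // 2
    out = []
    for t in range(len(clipped)):
        lo, hi = max(0, t - half), min(len(clipped), t + half + 1)
        out.append(sum(clipped[lo:hi]) / (hi - lo))
    return out


def split_long_motion(pme, T=100):
    """Zero every point that is the minimum of an all-positive +/-T window."""
    out = list(pme)
    for t in range(len(pme)):
        window = pme[max(0, t - T):min(len(pme), t + T + 1)]
        if min(window) > 0 and pme[t] == min(window):
            out[t] = 0                 # (t, PME(t)) becomes a splitting boundary
    return out


def find_triangles(pme):
    """Return (ts, te, peak) triples; runs of zeros give (ts, te, 0)."""
    segments, t, n = [], 0, len(pme)
    while t < n - 1:
        te = t + 1
        if pme[t] == 0 and pme[te] == 0:
            while te + 1 < n and pme[te + 1] == 0:    # extend the zero run
                te += 1
            segments.append((t, te, 0))
        else:
            while te < n - 1 and pme[te] != 0:        # walk to the next zero
                te += 1
            segments.append((t, te, max(pme[t:te + 1])))
        t = te
    return segments
```

For instance, with T=1 the sequence 0, 4, 6, 3, 7, 5, 0 has a single splitting boundary at the local minimum 3, so one long triangle becomes two triangles with peaks 6 and 7.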

[0099] After segmenting a sequence by the triangle model, we need representative features of a video segment to construct models that can predict user satisfaction with frame dropping from these features. We have experimented with two representative features: the peak PME value and the average PME value of a segment. The peak PME value of a video segment is picked as the only representative feature because our experimental results show it is more representative than the average PME value. The effectiveness of the peak PME value as a representative feature will be presented in more detail in the model evaluation section.

[0100] MOS Test

[0101] In the video library utilized to describe this embodiment, the coding type is MPEG-1, 352×288, 25 fps, CBR 1150 kbps, and the GOP structure is IBBPBBPBBPBBPBBPBB. After dropping 50% and 100% of the B frames of an original sequence, the frame rates of the two degraded sequences, named test sequence 1 and test sequence 2, are 16.7 fps and 8.3 fps respectively. Although the video bit-stream with some frames skipped can be decoded without much of a problem, the frame timing is changed. As a remedy, escape-coded frames can be used where B frames are skipped, instead of skipping them outright. Thus frame timing (i.e. frame display timing) is kept unchanged during playback.
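The two degraded frame rates follow directly from the GOP structure: of its 18 frames, 12 are B frames, so dropping half of them leaves 12 of every 18 frames and dropping all of them leaves 6 of every 18. The short sketch below reproduces this arithmetic; the function name is hypothetical.

```python
def degraded_frame_rate(gop, fps, drop_fraction):
    """Frame rate left after dropping a fraction of the B frames in a GOP."""
    b_frames = gop.count("B")
    kept = len(gop) - b_frames * drop_fraction
    return fps * kept / len(gop)


GOP = "IBBPBBPBBPBBPBBPBB"
rate_half = degraded_frame_rate(GOP, 25, 0.5)   # test sequence 1
rate_all = degraded_frame_rate(GOP, 25, 1.0)    # test sequence 2
```

Rounded to one decimal place, the results are 16.7 fps and 8.3 fps, matching the figures given for test sequences 1 and 2.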

[0102] In this example, the video library includes 56 original sequences. The total size of the library, including original sequences and test sequences, is about 20 hours. Each original sequence is divided into successive segments logically, but not physically, using the above-described triangle model. There are 7930 original segments and the average segment length is 73. As viewers can hardly distinguish differences if the test segment is too short, only segments longer than 2 seconds are selected as test candidates. In total then, in this example, 2870 test candidates exist in the library. DCR (Degradation Category Rating) was selected as the MOS test scheme. The Degradation Category Rating implies that the test sequences are presented in pairs: the first stimulus presented is always the original sequence, while the second stimulus is the degraded sequence having a lower frame rate. The degraded sequences in our test are test sequence 1 and test sequence 2 respectively.

[0103] The subjects were asked to rate the impairment of test sequences 1 and 2 in relation to the reference. Although many other factors may affect the rating, the testers were instructed to focus on impairment caused by motion judder. The following five-level scale for rating the impairment caused by frame rate decrease is used: 5-Imperceptible, 4-Perceptible but not annoying, 3-Slightly annoying, 2-Annoying, and 1-Very annoying.

[0104] The whole test was divided into independent test sessions and each session consisted of 160 segments. The average time of a session was about one hour. Within a session, no two segments have the same peak PME value, and each segment is randomly selected. Separate tests were performed for the MMS-PC models for the frame rates 8.3 fps and 16.7 fps, respectively. The presentation order of the segments in each session is also random. This randomness is intended to avoid viewer bias. Twenty viewers attended the test and, of a total of 120 sessions, 80 sessions were directed to model learning and 40 sessions were directed to model evaluation. Half of the sessions were for learning and evaluating the MMS-PC model for the frame rate 16.7 fps, and the other half were for the frame rate 8.3 fps.

[0105] The picture rate of the CRT monitor of our test PC is set to 85 Hz. To interface MPEG video at a low frame rate with the PC display at a high frame rate, image frames have to be repeated at time instances where the original sequence has not been sampled. This frame repetition will cause motion judder in viewing both original segments and degraded segments. However, since we use the Degradation Category Rating MOS test scheme, testers will rate the impairment of degraded segments caused by frame dropping in relation to the original segments. Other environment settings are according to ITU-T Recommendation P.910.

[0106] The MMS-PC Models

[0107] To predict user satisfaction with frame dropping in a video segment, we need to build a model that maps the peak PME value of the segment to a mean MOS score. We have obtained the model through a large number of MOS tests. The mean MOS score for each PME value is obtained by averaging all scores given to the segments with this PME value. The Mean MOS Score-PME Value charts for the frame rates 16.7 fps and 8.3 fps are shown in FIGS. 9 and 10 respectively. Each chart is the prediction model that maps peak PME values to mean MOS scores given a frame rate. To simplify the prediction models and to reflect the human perceived sensitivity to different ranges of motion judder, supervised clustering was used to cluster the points in FIGS. 9 and 10 into classes respectively. A PME class includes a range of PME values, and we use the mean MOS score of the range of PME values as the mean MOS score of a PME class. For a frame rate of 16.7 fps, four classes are clustered, and for a frame rate of 8.3 fps, five classes are clustered. In FIGS. 9 and 10, each indicated block represents a class. A regressed horizontal line (designated “class line”) is used to represent each class. The value of the regressed horizontal line is the mean MOS score of each class. So the combination of the regressed horizontal lines is the learned model. The model for a frame rate of 16.7 fps is designated “model 1” and the model for a frame rate of 8.3 fps is designated “model 2”. The class boundaries and mean MOS score of each class appear in Tables 1 and 2, respectively. FIG. 11 shows the models in another way.

TABLE 1 Mean MOS Scores and Class Boundaries for Model 1

                   Class 1    Class 2    Class 3    Class 4
Mean MOS Score     4.717034   4.510488   4.386745   4.594994
Class Boundaries   0-30       31-97      98-142     143-200

[0108] TABLE 2 Mean MOS Scores and Class Boundaries for Model 2

                   Class 1    Class 2    Class 3    Class 4    Class 5
Mean MOS Score     3.960221   3.779262   3.342755   3.109614   3.629562
Class Boundaries   0-12       13-30      31-97      98-142     143-200
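Each learned model is, in effect, a lookup from a peak PME value to a class and its mean MOS score. A minimal sketch of that lookup, using the values of Tables 1 and 2, is given below. The names `MODEL_1`, `MODEL_2`, and `predict_mos` are hypothetical, and assigning a non-integer peak PME value to the first class whose upper boundary it does not exceed is an assumption.

```python
from bisect import bisect_left

# (upper class boundaries, mean MOS scores) copied from Tables 1 and 2
MODEL_1 = ([30, 97, 142, 200],
           [4.717034, 4.510488, 4.386745, 4.594994])            # 16.7 fps
MODEL_2 = ([12, 30, 97, 142, 200],
           [3.960221, 3.779262, 3.342755, 3.109614, 3.629562])  # 8.3 fps


def predict_mos(model, peak_pme):
    """Return (class number, mean MOS score) for a segment's peak PME value."""
    uppers, scores = model
    clamped = min(max(peak_pme, 0), uppers[-1])   # PME range is cut to [0, 200]
    index = bisect_left(uppers, clamped)
    return index + 1, scores[index]
```

For example, a peak PME value of 7 falls in class 1 of both models, with predicted mean MOS scores of 4.717034 at 16.7 fps and 3.960221 at 8.3 fps.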

[0109] The human eyes can compensate for scene motion by SPEM (smooth pursuit eye movement). However, the maximum speed of SPEM is about 20 to 30 deg/sec. So, when the motion is faster than this speed, viewers will become less sensitive to frame rate decreases and the scores will increase. In our viewing condition, the viewing distance is about 8H (where H indicates the picture height), so this maximum tracking speed corresponds to a speed of 26-40 pixels/frame. If all of the motion vectors have the same direction, the corresponding PME value is between 118-191. As a result, the mean score of class 4 is larger than that of class 3 in model 1, and the mean score of class 5 is larger than that of class 4 in model 2.

[0110] Based on 20 separate test results, we evaluated the performance of the two MMS-PC models by Pearson correlation coefficient. The average Pearson correlation coefficient between the prediction by model 1 and the real test results is 0.9, and the average Pearson correlation coefficient between the prediction by model 2 and the real test results is 0.95. Such a high correlation between the predictions by the MMS-PC models and real test results indicates that the PME feature closely embodies characteristics of human perception of frame dropping, which makes the MMS-PC models valid.

[0111] A Priority-Based Delivery Model

[0112] With the MMS-PC models, we developed a priority-based delivery model that describes which frames are more important to human perception. In the described priority-based delivery model, I and P frames are given the highest priority because of decoding interdependency. B frames are assigned priority levels according to their temporal positions and the peak PME values of their segments.

[0113] If half of a segment's B frames are dropped, the first of two successive B frames is always dropped first. As a result, the first of any two successive B frames is assigned a lower priority level than the B frame that follows. Dropping one P frame will severely degrade video quality. In various embodiments, we focus on B frame dropping and assign the same priority level to all P frames within a GOP.

[0114] FIG. 12 illustrates aspects of a priority-based delivery model. The two halves of the B frames of any segment are designated as either a “low-priority-half” or a “high-priority-half”. The priority levels of the low-priority-half and high-priority-half are determined by the degraded quality if they are dropped. The low-priority-half and high-priority-half can be assigned a class according to the peak PME value of the segment. As there are four classes in MMS-PC model 1 and five classes in MMS-PC model 2, four priority levels exist for the low-priority-half and five priority levels exist for the high-priority-half respectively. So, in total, counting the two levels for I and P frames, eleven priority levels exist in the priority-based delivery model.

[0115] Table 3 below describes the mapping between priority levels and the MMS-PC model class.

TABLE 3 Mapping Between Priority Level and MMS-PC Model Class

Priority level   11   10   9   8   7   6   5   4   3   2   1
Frame type       I    P    B   B   B   B   B   B   B   B   B
MMS-PC Model               2   2   2   2   2   1   1   1   1
Class                      4   3   5   2   1   3   2   4   1

[0116] For example, suppose the peak PME value of segment i is 7. The low-priority-half is then assigned priority level 1 and the high-priority-half is assigned priority level 5, according to Tables 1, 2, and 3. This is done in the following manner. Notice that the PME value of 7 corresponds to the first class in each of FIGS. 9 and 10. Notice also that the bottom rows of Table 3 contain entries for each of the classes in each of the models. Specifically, for model 1 there are classes 1-4 and for model 2 there are classes 1-5. Since the above PME value corresponds to class 1 in each of the models, class 1 can be mapped to a priority level of 1 (for model 1) and 5 (for model 2). The priority levels of the low-priority-half and the high-priority-half of segments i−1 and i+1 are also determined by their peak PME values.
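The mapping in the example above can be sketched as a pair of table lookups. The dictionary and function names are hypothetical; the class boundaries and the class-to-priority assignments are copied from Tables 1, 2, and 3, with the low-priority-half looked up in model 1 and the high-priority-half in model 2, as in the example.

```python
from bisect import bisect_left

MODEL_1_UPPERS = [30, 97, 142, 200]            # class boundaries, Table 1
MODEL_2_UPPERS = [12, 30, 97, 142, 200]        # class boundaries, Table 2

# Table 3: MMS-PC class -> priority level
LOW_HALF_PRIORITY = {1: 1, 4: 2, 2: 3, 3: 4}            # model 1 classes
HIGH_HALF_PRIORITY = {1: 5, 2: 6, 5: 7, 3: 8, 4: 9}     # model 2 classes
I_FRAME_PRIORITY, P_FRAME_PRIORITY = 11, 10


def pme_class(uppers, peak_pme):
    """Class number (1-based) of a peak PME value, clamped to the PME range."""
    clamped = min(max(peak_pme, 0), uppers[-1])
    return bisect_left(uppers, clamped) + 1


def b_frame_priorities(peak_pme):
    """Return (low-priority-half level, high-priority-half level)."""
    low = LOW_HALF_PRIORITY[pme_class(MODEL_1_UPPERS, peak_pme)]
    high = HIGH_HALF_PRIORITY[pme_class(MODEL_2_UPPERS, peak_pme)]
    return low, high
```

Calling `b_frame_priorities(7)` returns `(1, 5)`, the levels given in the example. In Table 3, classes with lower mean MOS scores map to higher priority levels within each model, so the frames whose loss is most annoying are protected first.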

[0117] In FIG. 12, there are three segments within the delivery window. Under the constraints of available bandwidth, the frames with higher priority within a delivery window will be delivered. That is effectively equivalent to dropping frames with lower priorities. In this example, the frames of priorities higher than five are delivered, and the frames of priorities lower than six are dropped. In this priority-based delivery model, frames of lower importance to a viewer's perceived quality have lower priorities and frames of higher importance to the viewer's perceived quality have higher priorities.

[0118] Perceptual Temporal Video Adaptation

[0119] Available bandwidth typically varies throughout delivery of a stream. The described perceptual temporal video adaptation scheme actively drops B frames and optimally utilizes available bandwidth and client buffer space such that video quality is improved and smoothed. An underlying principle behind active frame dropping is to actively drop unimportant frames while bandwidth is stable, and thus to save bandwidth resources for filling the client buffer. Thus, important frames are protected and the client buffer can be used as bandwidth decreases. The heart of active frame dropping is the same as active layer dropping in the above described FGS video streaming. The forward-shifting technique and content-aware decision making are utilized in the temporal video adaptation scheme.

[0120] Active frame dropping thus bridges the gap between the priority-based delivery model and optimal utilization of available bandwidth and client buffer space. When the bandwidth is stable, the client buffer is typically not full, and current B frames are of low priorities. These low priority B frames are then actively dropped to forward-shift the whole stream to the client buffer. When available bandwidth decreases and current frames are of high priorities, the client buffer is used so that the bandwidth decrease is hidden from the decoder and the video quality is improved and smoothed. As soon as bandwidth recovers again, the active frame dropping can be restarted to prepare for the next bandwidth decrease. As a result, important B frames are protected and video quality is smoothed. In addition, active B frame dropping does not typically result in any start-delay.

[0121] FIG. 13 shows a video streaming system that utilizes the Internet. The system includes video data 1300 that is received by a streaming server 1302 for streaming over the Internet 1306. The streaming server works under the influence of a rate controller 1304 which can control the rate at which video is streamed. A client 1308 receives streamed video over the Internet 1306. The rate controller 1304 can desirably receive feedback from the Internet 1306 and/or the client 1308. Server 1302 outputs the MPEG-4 base layer and the enhancement layer streams. The feedback from the client and/or the Internet is used to estimate the available bandwidth. Thereby, the rate at which the enhancement layer is sent can be dynamically controlled by the rate controller 1304 according to network conditions.

[0122] The embodiment described below focuses on the rate controller 1304. In the illustrated and described embodiment, rate controller 1304 is implemented as a closed-loop, feedback rate control system.

[0123] Exemplary Rate Controller

[0124] FIG. 14 shows an exemplary rate controller 1304 in accordance with one embodiment in additional detail. The illustrated rate controller comprises a state machine 1400, a bandwidth allocation module 1402, and a virtual buffer model 1404. The heart of the framework is the state machine 1400, which implements perceptual rate adaptation through mode and state transitions. Mode transitions adapt to long-term bandwidth or bit-rate variations, and state transitions adapt to short-term bandwidth or bit-rate variations.

[0125] Virtual buffer model 1404 is introduced to describe the dynamic buffer filling and draining process. A constant frame consumption rate is assumed in the buffer model. The buffer status is fed back to the state machine 1400 for smoothing. The bandwidth allocation module 1402 allocates bandwidth to delivered frames given the state and mode.

[0126] The following notations are utilized in the description that follows:

[0127] W₀: constant frame consumption rate.

[0128] W(k): sliding window size at kth time slot.

[0129] S₀: client buffer capacity.

[0130] S(k): number of buffered frames after kth time slot.

[0131] Q(k): the first streamed frame at kth time slot.

[0132] M: number of priority levels, M=11

[0133] Mode(k): mode of state machine, Mode(k)∈{1,2, . . . , 11}.

[0134] State(k): state of state machine. State(k)∈{1,2,3}

[0135] B(k): estimated available bandwidth.

[0136] AB(k): average bandwidth until time slot k: $\mathrm{AB}(k) = \left( \sum_{l=1}^{k} B(l) \right) / k$

[0137] R_(j)^(i): for the jth frame, whose priority level is i₀ and whose size is L₀: if i=i₀, then R_(j)^(i)=L₀, else R_(j)^(i)=0.

[0138] L_(k)^(i)(j): for the jth frame at the kth time slot, whose priority level is i₀ and whose size is L₀: if i=i₀, then L_(k)^(i)(j)=L₀, else L_(k)^(i)(j)=0.

[0139] T(n, k): number of frames that can be transmitted under mode n and available bandwidth. It is determined by $B(k) = \sum_{i=n}^{M} \sum_{j=Q(k)}^{Q(k)+T(n,k)-1} L_{k}^{i}(j)$

[0140] E_(k)^(i): bit size of priority i of current W(k) frames, so $E_{k}^{i} = \sum_{j=Q(k)}^{Q(k)+W(k)-1} L_{k}^{i}(j)$

[0141] QE_(k)^(i): bit size of priority i of current W₀ frames, so $\mathrm{QE}_{k}^{i} = \sum_{j=Q(k)}^{Q(k)+W_{0}-1} L_{k}^{i}(j)$

[0142] P(n, k): bit size of frames that are protected in mode n with the window size W(k): $P(n,k) = \sum_{i=n}^{M} E_{k}^{i}$

[0143] QP(n,k): bit size of frames that would be protected if the window size is W₀: $\mathrm{QP}(n,k) = \sum_{i=n}^{M} \mathrm{QE}_{k}^{i}$

[0144] D(k): a priori information, including R_(j) ^(i), L_(k) ^(i)(j),T(n,k), E_(k) ^(i), QE_(k) ^(i), P(n,k), QP(n,k), N, TP(i), etc.

[0145] N: number of total frames of the sequence.

[0146] TP(i): the average size of priority level i of the sequence: $\mathrm{TP}(i) = \left( \sum_{j=1}^{N} R_{j}^{i} \right) / N$
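To make the notation concrete, the quantities P(n,k), QP(n,k), and T(n,k) can be sketched as follows. The function names and the representation of the frames starting at Q(k) as a list of (priority, size) pairs are assumptions. T(n,k) is taken here as the largest frame count whose protected bits do not exceed B(k), which is one reading of the defining equality.

```python
def protected_bits(frames, mode, window):
    """Bits of frames with priority >= mode among the first `window` frames.

    With window = W(k) this is P(n, k); with window = W0 it is QP(n, k).
    """
    return sum(size for priority, size in frames[:window] if priority >= mode)


def transmittable_frames(frames, mode, bandwidth):
    """T(n, k): longest run of frames whose protected bits fit in bandwidth."""
    used = 0
    count = 0
    for priority, size in frames:
        cost = size if priority >= mode else 0   # unprotected frames are dropped
        if used + cost > bandwidth:
            break
        used += cost
        count += 1
    return count
```

Frames below the current mode cost no bandwidth because they are dropped, so raising the mode lets the same bandwidth cover a longer window of frames.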

[0147] The Virtual Buffer Model

[0148] In the illustrated embodiment, virtual buffer model 1404 is modeled by a frame FIFO (First In, First Out) queue. The frame rate entering the buffer is W(k), the constant frame consumption rate is W₀, and S(k−1) is the number of frames buffered after time slot k−1. Accordingly, the filling rate is W(k)−W₀ and the draining rate is W₀−W(k). The following equation holds:

S(k)=S(k−1)+(W(k)−W ₀)

[0149] To avoid underflow: W(k)≧W₀−S(k−1)

[0150] To avoid overflow: W(k)≦S₀+W₀−S(k−1)

[0151] In this buffer model, underflow and overflow are in the sense of frame number, not in the sense of bit size. However, bit size and frame number are interchangeable. To make the following discussion simple, it is assumed that the client buffer size is large enough for any successive S₀ frames.
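A minimal sketch of the virtual buffer recurrence and its underflow and overflow limits is given below. The function names are hypothetical, and the lower limit is clamped at zero on the assumption that a negative number of frames cannot be sent.

```python
def window_bounds(s_prev, w0, s0):
    """Smallest and largest W(k) that avoid underflow and overflow."""
    lower = max(w0 - s_prev, 0)      # W(k) >= W0 - S(k-1)
    upper = s0 + w0 - s_prev         # W(k) <= S0 + W0 - S(k-1)
    return lower, upper


def update_buffer(s_prev, w, w0):
    """S(k) = S(k-1) + (W(k) - W0)."""
    return s_prev + (w - w0)
```

For instance, with W₀ = 25, S₀ = 50, and 10 frames buffered, the permitted window runs from 15 to 65 frames; sending 15 frames empties the buffer exactly and sending 65 frames fills it exactly.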

[0152] The State Machine

[0153] In the illustrated and described embodiment, state machine 1400 implements perceptual rate adaptation by mode and state transitions. A mode defines what priority levels are protected. The priority levels that are higher than or equal to the current mode are protected. If bandwidth or stream bit-rate changes over the long term, the mode transits to decrease or increase the priority level to be protected. Mode increases mean that fewer priority levels are protected and mode decreases mean that more priority levels are protected.

[0154] A state determines the current delivery window size. Within each mode, the state transits to absorb short-term bandwidth or stream bit-rate variations. When bandwidth decreases, buffered frames are used in addition to transmitted frames to provide a constant frame rate. When bandwidth recovers, unimportant frames are actively dropped and more frames are forward-shifted to fill the client buffer.

[0155] State Transition

FIG. 15 is a diagram that illustrates state and mode transitions in accordance with one embodiment. In this example, there are three states in the state machine 1400, namely, State 1, State 2, and State 3. A pair of window sizes (max(W₀−S(k−1),0), S₀+W₀−S(k−1)) is declared to the state machine for a given time slot k. The left item of the pair declares the minimum number of frames that the client buffer needs to avoid underflow. Similarly, the right item declares the maximum number of frames that the client buffer can accept to avoid overflow. The state machine 1400 can adjust its sending frame rate W(k), called the sliding window here, from max(W₀−S(k−1),0) to S₀+W₀−S(k−1). To make the rate controller 1304 more robust to bandwidth decreases and stream bit-rate increases, it is desirable to keep S(k−1) at S₀ and W(k) at W₀. This is the stable state, called State 1, which the state machine will try to maintain.

[0156] When bandwidth is insufficient for W₀ frames, W(k) is decreased and the state transits to State 2. In State 2, the buffer is used as additional bandwidth and, thus, insufficient bandwidth is hidden from the decoder. When the bandwidth recovers, the state transits to State 3. In this state, the window size W(k) is maximized by actively dropping the frames that are not protected, so the buffer is filled again. State 3 remains until the number of buffered frames equals S₀ again. By sliding the window, short-term bandwidth and stream bit-rate fluctuations are absorbed to provide a constant frame rate. Formula descriptions are as follows.

[0157] State 1: W(k)=W₀, S(k)=S₀ and B(k)≧QP(n,k)

[0158] State 2: P(n, k)=B(k), B(k)<QP(n,k), W(k)<W₀, and S(k)<S₀

[0159] In State 2, all frames of priority lower than n are dropped.

[0160] State 3: P(n, k)=B(k), B(k)≧QP(n, k), W(k)≧W₀, and S(k)<S₀

[0161] In State 3, active dropping is performed and all frames of priority lower than n are dropped.

[0162] The state transition conditions in FIG. 11 are as follows:

[0163] $T_n^0$: B(k+1)≧QP(n,k+1)

[0164] Note: Bandwidth is sufficient to keep the current state and mode.

[0165] $T_n^1$: B(k+1)<QP(n,k+1)

[0166] Note: Bandwidth is insufficient, so decrease the sliding window.

[0167] $T_n^2$: B(k+1)<QP(n,k+1) and S(k)+T(n,k+1)≧W₀

[0168] Note: Bandwidth is still insufficient, but the client buffer does not underflow.

[0169] $T_n^3$: B(k+1)≧QP(n,k+1) and S(k)+T(n,k+1)≧W₀

[0170] Note: Bandwidth increases, so actively drop unprotected frames.

[0171] $T_n^4$: B(k+1)<QP(n,k+1) and S(k)+T(n,k+1)≧W₀

[0172] Note: Bandwidth decreases, so decrease the sliding window.

[0173] $T_n^5$: B(k+1)≧QP(n,k+1) and S(k)<S₀

[0174] Note: Fill the buffer by active dropping when bandwidth is sufficient.

[0175] $T_n^6$: B(k+1)≧QP(n,k+1) and S(k)=S₀

[0176] Note: The client buffer is full, so return to the stable state.
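One possible reading of the transition conditions above is sketched below in Python. The mapping of each condition to a source and destination state (T0 and T1 leaving State 1, T2 and T3 leaving State 2, T4 through T6 leaving State 3) is an assumption inferred from the accompanying notes, not taken from the figure. The function name `next_state`, its parameter names, and the choice to test the buffer-underflow condition S(k)+T(n,k+1)<W₀ before anything else (reported here as "D", a request for a mode change) are likewise hypothetical.

```python
from enum import Enum


class State(Enum):
    STATE_1 = 1  # stable: W(k) = W0 and S(k) = S0
    STATE_2 = 2  # bandwidth insufficient: window shrinks, buffer drains
    STATE_3 = 3  # bandwidth recovered: active dropping refills the buffer


def next_state(state, b_next, qp_next, s_k, t_next, w0, s0):
    """Return (next state, label of the condition that fired).

    b_next  : B(k+1)     qp_next : QP(n, k+1)
    s_k     : S(k)       t_next  : T(n, k+1)
    """
    # Assumed precedence: if the buffer would underflow, signal a mode change.
    if s_k + t_next < w0:
        return state, "D"
    sufficient = b_next >= qp_next
    if state is State.STATE_1:
        return (State.STATE_1, "T0") if sufficient else (State.STATE_2, "T1")
    if state is State.STATE_2:
        return (State.STATE_3, "T3") if sufficient else (State.STATE_2, "T2")
    # State 3
    if not sufficient:
        return State.STATE_2, "T4"
    if s_k < s0:
        return State.STATE_3, "T5"
    return State.STATE_1, "T6"


# Hypothetical slot: bandwidth drops while the machine is in the stable state.
print(next_state(State.STATE_1, b_next=60, qp_next=80,
                 s_k=300, t_next=100, w0=150, s0=300))
```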

[0177] Mode Transition

[0178] The mode of the state machine is decided by so-called a priori information of the stream and average bandwidth. The current mode i is determined by

$\sum_{j=i-1}^{M} TP(j) > AB(k) \geq \sum_{j=i}^{M} TP(j)$

[0179] Because it depends only on these averages, the mode of the state machine does not change frequently and video quality is smooth. If long-term bandwidth variation occurs, the mode transits to adapt to the variation. However, when the transmission of frames of priority levels higher than or equal to n cannot be guaranteed and constant frame rate consumption cannot be guaranteed, the state machine 1400 will immediately decrease the priority levels that are protected. As shown in FIG. 15, the condition of transition from mode n to mode n+i is

[0180] $D_{n+i}^{n}$: S(k)+T(n,k+1)<W₀

[0181] where i satisfies: QP(n+i−1,k+1)>B(k)≧QP(n+i,k+1).
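The mode-selection inequality on TP(j) and AB(k) can be illustrated with a short Python sketch. It walks the priority levels from M downward, accumulating TP(j), and returns the smallest i whose suffix sum still fits within AB(k). The function name `select_mode`, the dictionary layout, the sample numbers, and the convention of returning M+1 when not even level M fits are all assumptions of this example. AB(k) and TP(j) are assumed to be expressed in the same units.

```python
def select_mode(tp, ab):
    """Return the mode i with sum_{j>=i-1} TP(j) > AB(k) >= sum_{j>=i} TP(j).

    tp : dict mapping priority level j (1..M) to TP(j)
    ab : average bandwidth AB(k), in the same units as TP(j)
    """
    m = max(tp)
    mode = m + 1   # assumed convention: no priority level can be protected
    suffix = 0
    for i in range(m, 0, -1):
        suffix += tp[i]
        if ab >= suffix:
            mode = i   # levels i..M fit within the average bandwidth
        else:
            break
    return mode


# Hypothetical TP values for M = 4 priority levels.
tp_example = {1: 50, 2: 0, 3: 30, 4: 20}
print(select_mode(tp_example, ab=60))
```

With these sample values the levels 2 through 4 sum to 50 and the levels 1 through 4 sum to 100, so an average bandwidth of 60 selects mode 2.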

[0182] Bandwidth Allocation

[0183] Given the current mode Mode(k) and state State(k), available bandwidth is allocated to delivered frames. For the three states, the frames whose priorities are higher than or equal to Mode(k) are all transmitted. Specifically, for State 1, extra bandwidth B(k)−QP(Mode(k),k) is allocated to the frames with the highest priority. The frames are delivered in a frame-by-frame order along the time axis as shown in FIG. 12. The reasons are twofold. First, the newly transmitted frames may be consumed during the current time slot. Therefore, buffering efficiency can be increased. Second, the frame-by-frame order does not need any synchronization at the client side and simplifies system realization.

[0184] Simulation Examples

[0185] A two-state Markov model, as proposed by Yee and Weldon in “Evaluation of the performance of error correcting codes on a Gilbert channel”, IEEE Trans. Comm., Vol. 43, No. 8, pp. 2316-2323 (1995), is used to simulate packet losses in the Internet channel.

[0186] In the simulation described below, a simplified model is used, so the average packet loss rate is

$p = \frac{1 - \alpha}{(1 - \alpha) + (1 - \beta)} \qquad (6)$
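Equation (6) can be checked numerically with the Python sketch below, which also generates a loss pattern from a two-state chain. The sketch assumes that α is the probability of remaining in the no-loss state and β the probability of remaining in the loss state; under that reading, equation (6) is the stationary probability of the loss state. The function names and the seed are hypothetical, and the sample values α=0.999 and β=0.911 are those listed in the parameter settings.

```python
import random


def average_loss_rate(alpha, beta):
    """Equation (6): long-run fraction of lost packets."""
    return (1 - alpha) / ((1 - alpha) + (1 - beta))


def simulate_losses(alpha, beta, n, seed=0):
    """Return n booleans (True = packet lost) drawn from the two-state chain."""
    rng = random.Random(seed)
    lost = False  # assumed to start in the no-loss state
    pattern = []
    for _ in range(n):
        stay = beta if lost else alpha
        if rng.random() >= stay:  # leave the current state
            lost = not lost
        pattern.append(lost)
    return pattern


p = average_loss_rate(0.999, 0.911)
print(f"p = {p:.4f}")
trace = simulate_losses(0.999, 0.911, 100_000)
print(f"empirical loss rate = {sum(trace) / len(trace):.4f}")
```

The empirical rate of a finite trace fluctuates around p, so the two printed numbers agree only approximately.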

[0187] Details of the model can be found in Yee and Weldon referenced above. Assuming that the transport protocol is TCP-friendly, an equation-based way to estimate available bandwidth is used, such as the one described in Floyd, et al., “Equation-Based Congestion Control for Unicast Applications”, SIGCOMM 2000, February 2000. The equation is

$T = \frac{s}{R\sqrt{\frac{2p}{3}} + t_{RTO}\left(3\sqrt{\frac{3p}{8}}\right) p \left(1 + 32p^{2}\right)} \qquad (7)$

[0188] where s is the size of the packet, R is the round-trip time, $t_{RTO}$ is the timeout value, and p is the packet loss rate. Other details are described in Floyd et al. referenced above. The bandwidth curves shown in FIG. 16 and FIG. 17 are chosen from realizations of the stochastic process characterized by the model.
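The bandwidth estimate of equation (7) can be evaluated directly, as in the Python sketch below. The function name `tcp_friendly_rate` is hypothetical, and the packet size of 1000 bytes is an assumed value chosen only for illustration, since the packet size is not stated here. The round-trip time, timeout, and loss rate follow the parameter settings (R=0.011 sec, t_RTO=0.1 sec, and p from equation (6) with α=0.999 and β=0.911).

```python
import math


def tcp_friendly_rate(s, rtt, t_rto, p):
    """Equation (7): estimated rate in units of s per second.

    s     : packet size (bytes here, an assumption)
    rtt   : round-trip time R in seconds
    t_rto : retransmission timeout in seconds
    p     : packet loss rate, 0 < p <= 1
    """
    denom = (rtt * math.sqrt(2 * p / 3)
             + t_rto * (3 * math.sqrt(3 * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom


p = (1 - 0.999) / ((1 - 0.999) + (1 - 0.911))  # equation (6)
rate = tcp_friendly_rate(s=1000, rtt=0.011, t_rto=0.1, p=p)
print(f"estimated bandwidth: {rate:.0f} bytes/s")
```

Because the denominator grows with p, a higher loss rate always yields a lower estimate, which is what drives the bandwidth fluctuations in the simulated curves.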

[0189] The coding type and the GOP structure of the test sequences are the same as those of the video library. Two sequences, S₁ and S₂, are selected from our video library. The TP(i) of the two sequences is shown in Table 4. Other parameter settings are in Table 5.

TABLE 4

i     S₁        S₂
1     132150    67350
2     0         0
3     97500     154650
4     10050     14100
5     57000     43200
6     75750     25050
7     0         0
8     99150     156150
9     10050     14250
10    32950     339300
11    255600    252450

[0190] Table 4: TP(i) of the two sequences (bytes)

TABLE 5: Parameter Settings

Parameter   Value
W₀          150
S₀          300
Timeslot    6 seconds (150 frames)
S(0)        50
R           0.011 sec
β           0.911
t_RTO       0.1 sec
α           0.999

[0191] Simulation Results

[0192] FIGS. 18 and 19 show behaviors of the state machine under bandwidth curve 1. FIG. 18a and FIG. 19a show the mode transitions of the state machine. It can be seen that frames of priority levels higher than 4 are all protected. At the same time, the mode curve is a straight line; that is, the video quality is smooth throughout the whole delivered sequence. FIG. 18b and FIG. 19b show the state transition curves. When bandwidth decreases, the state transits to State 2 such that fewer frames are sent and the client buffer is used as bandwidth. When bandwidth recovers, the state transits to State 3 to forward-shift more frames and the client buffer is filled again. In normal conditions, the state remains in stable State 1. FIG. 18c and FIG. 19c show the window size, which changes according to the current state. In State 1, the window size is W₀. In State 2, the window size is smaller than W₀, and in State 3, the window size is larger than W₀. FIG. 18d and FIG. 19d show client buffer fullness. Notice that the fullness changes according to the window size. Similarly, FIGS. 20 and 21 show behaviors of the state machine under bandwidth curve 2.

[0193] From these simulations, we can see that frames of more importance to human perception are protected in case of a sharp bandwidth decrease, and the overall video quality is improved and smoothed. The proposed rate controller can realize active frame dropping and absorb short-term bandwidth fluctuations through state transitions. As a result, a priori information about the video stream, available bandwidth, and client buffer are optimally utilized.

[0194] Conclusion

[0195] In the embodiments described above, MMS-PC models are presented that can predict user satisfaction with regard to frame dropping by low-level features. A priority-based delivery model is constructed based on the models. A perceptual video adaptation scheme is then developed, which actively drops B frames according to the priority-based delivery, and optimally utilizes available bandwidth and client buffers. A state machine realizes the adaptation scheme. State transitions or mode transitions of the state machine absorb short-term or long-term bandwidth and bit-rate variations. The result of the above systems and methods can be improved video quality and smoothness.

[0196] Although the invention has been described in language specific to structural features and/or methodological steps, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features or steps described. Rather, the specific features and steps are disclosed as preferred forms of implementing the claimed invention.

1. A method comprising: providing a streaming server configured to stream data; and at the streaming server, dropping at least portions of an enhancement layer sufficient to forward-shift following portions of the enhancement layer by an amount.
2. The method of claim 1 further comprising: determining whether portions of the enhancement layer are important; and making a decision to drop enhancement layer portions based, at least in part, on whether the portions are important.
3. The method of claim 1 further comprising: determining a client side buffer size; and making a decision to drop enhancement layer portions based, at least in part, on the client side buffer size.
4. The method of claim 1, wherein the enhancement layer comprises individual blocks having block sizes, and said amount by which said other portions are shifted comprises at least one block size.
5. The method of claim 4, wherein individual block sizes contain the same amount of bits.
6. The method of claim 1, wherein dropping said portions of the enhancement layer comprises dropping portions that are less important than portions that are not dropped.
7. The method of claim 1, wherein the enhancement layer comprises part of an FGS stream.
8. The method of claim 1 further comprising, if the enhancement layer contains important content and bandwidth is insufficient, delaying transmission of portions of the enhancement layer that contain the important content.
9. The method of claim 1, wherein said act of dropping comprises dropping at least one enhancement layer.
10. The method of claim 1, wherein said act of dropping comprises dropping at least one frame.
11. The method of claim 1, wherein said act of dropping comprises dropping at least one block.
12. One or more computer-readable media having computer-readable instructions thereon which, when executed by one or more processors, cause the one or more processors to implement the method of claim 1.
13. A method comprising: determining an importance of content that is to be streamed to a client by a streaming server; and making a decision to forward-shift following portions of the content to the client based on the determined importance of the content.
14. The method of claim 13, wherein said act of making a decision comprises deciding, when network bandwidth is more available, to more actively forward shift content that is determined to be important than content that is determined to be not as important.
15. The method of claim 13, wherein said act of making a decision comprises deciding, when network bandwidth is less available, to less actively forward shift content that is determined to be less important.
16. The method of claim 13, wherein said act of making a decision comprises: deciding, when network bandwidth is more available, to more actively forward shift content that is determined to be important than content that is determined to be not as important; and deciding, when network bandwidth is less available, to less actively forward shift content that is determined to be less important.
17. A method comprising: first dropping at least portions of an enhancement layer sufficient to forward-shift following portions of the enhancement layer by an amount; determining whether a decrease in available bandwidth has occurred; responsive to said determining, determining whether the enhancement layer contains important content, and: if the enhancement layer does not contain important content, second dropping additional portions of the enhancement layer in an attempt to preserve a forward-shifted enhancement layer, otherwise, if the enhancement layer does contain important content, delaying transmission of portions of the enhancement layer so that the delayed portions can be transmitted when the available bandwidth increases.
18. The method of claim 17, wherein said first dropping occurs in the absence of a decrease in available bandwidth.
19. The method of claim 17, wherein the enhancement layer comprises individual blocks having block sizes and said amount comprises at least one block size.
20. The method of claim 19, wherein individual block sizes contain the same amount of bits.
21. The method of claim 17, wherein the enhancement layer comprises part of an FGS stream.
22. The method of claim 17, wherein said act of first dropping occurs prior to streaming any enhancement layer portions.
23. The method of claim 17 further comprising: determining that the available bandwidth has increased; responsive to determining that the available bandwidth has increased, transmitting the delayed portions of the enhancement layer; and after said transmitting of the delayed portions, dropping additional portions of the enhancement layer sufficient to forward shift following portions of the enhancement layer by an amount.
24. The method of claim 17, wherein said acts of dropping comprise dropping one or more enhancement layers.
25. The method of claim 17, wherein said acts of dropping comprise dropping one or more frames.
26. The method of claim 17, wherein said acts of dropping comprise dropping one or more blocks.
27. One or more computer-readable media having computer-readable instructions thereon which, when executed by one or more processors, cause the one or more processors to implement the method of claim 17.
28. A method comprising: actively dropping portions of an enhancement layer when available bandwidth is constant to effectively forward shift other portions of the enhancement layer; analyzing content of the enhancement layer to determine its relative importance; and if content is determined to not be important, dropping associated portions of the enhancement layer, otherwise, delaying transmission of associated portions of the enhancement layer.
29. The method of claim 28, wherein said act of analyzing is performed responsive to a decrease in available bandwidth.
30. The method of claim 28, wherein said enhancement layer comprises part of an FGS stream.
31. The method of claim 28, wherein said dropping of the enhancement layer portions comprises dropping one or more enhancement layers.
32. The method of claim 28, wherein said dropping of the enhancement layer portions comprises dropping one or more frames.
33. The method of claim 28, wherein said dropping of the enhancement layer portions comprises dropping one or more blocks.
34. One or more computer-readable media having computer-readable instructions thereon which, when executed by one or more processors, cause the one or more processors to implement the method of claim 28.
35. A streaming server comprising: one or more processors; memory; and software code embodied in the memory which, when executed by the one or more processors, causes the one or more processors to drop at least portions of an enhancement layer sufficient to forward-shift following portions of the enhancement layer by an amount.
36. The streaming server of claim 35, wherein the software code causes the one or more processors to: determine whether portions of the enhancement layer are important; and make a decision to drop enhancement layer portions based, at least in part, on whether the portions are important.
37. The streaming server of claim 35, wherein the software code causes the one or more processors to: determine a client side buffer size; and make a decision to drop enhancement layer portions based, at least in part, on the client side buffer size.
38. The streaming server of claim 35, wherein the software code causes the one or more processors to drop portions of the enhancement layer that are less important than portions that are not dropped.
39. A streaming server comprising: one or more processors; memory; software code embodied in the memory which, when executed by the one or more processors, causes the one or more processors to: first drop at least portions of an enhancement layer sufficient to forward-shift following portions of the enhancement layer by an amount; determine whether a decrease in available bandwidth has occurred; responsive to determining that a decrease in available bandwidth has occurred, determine whether the enhancement layer contains important content, and: if the enhancement layer does not contain important content, second drop additional portions of the enhancement layer in an attempt to preserve a forward-shifted enhancement layer, otherwise, if the enhancement layer does contain important content, delay transmission of portions of the enhancement layer so that the delayed portions can be transmitted when the available bandwidth increases.
40. The streaming server of claim 39, wherein the software code causes the one or more processors to first drop said portions in the absence of a decrease in available bandwidth.
41. The streaming server of claim 39, wherein the software code causes the one or more processors to first drop said portions prior to streaming any enhancement layer portions.
42. The streaming server of claim 39, wherein the software code causes the one or more processors to: determine that the available bandwidth has increased; responsive to determining that the available bandwidth has increased, transmit the delayed portions of the enhancement layer; and after transmitting the delayed portions, drop additional portions of the enhancement layer sufficient to forward shift other portions of the enhancement layer by an amount.
43. The streaming server of claim 39, wherein the software code causes the one or more processors to drop one or more enhancement layers.
44. The streaming server of claim 39, wherein the software code causes the one or more processors to drop one or more frames.
45. The streaming server of claim 39, wherein the software code causes the one or more processors to drop one or more blocks.
46. One or more computer-readable media having computer-readable instructions thereon which, when executed by one or more processors, cause the processors to operate a streaming server by dropping at least portions of an enhancement layer comprising part of an FGS stream sufficient to forward-shift other portions of the enhancement layer by an amount.
47. The one or more computer-readable media of claim 46, wherein the computer-readable instructions cause the processors to drop portions that are less important than other portions.
48. The one or more computer-readable media of claim 46, wherein the computer-readable instructions cause the processors to determine whether the enhancement layer contains important content; and if the enhancement layer does not contain important content, drop additional portions of the enhancement layer in an attempt to preserve a forward-shifted enhancement layer; and if the enhancement layer contains important content, delay transmission of portions of the enhancement layer that contain the important content.
49. The one or more computer-readable media of claim 46, wherein the computer-readable instructions cause the processors to drop one or more enhancement layers.
50. The one or more computer-readable media of claim 46, wherein the computer-readable instructions cause the processors to drop one or more frames.
51. The one or more computer-readable media of claim 46, wherein the computer-readable instructions cause the processors to drop one or more blocks.
52. One or more computer-readable media having computer-readable instructions thereon which, when executed by one or more processors, cause the processors to: first drop at least portions of an enhancement layer comprising part of an FGS stream sufficient to forward-shift other portions of the enhancement layer by an amount; determine whether a decrease in available bandwidth has occurred; responsive to determining that a decrease in available bandwidth has occurred, determine whether the enhancement layer contains important content, and: if the enhancement layer does not contain important content, second drop additional portions of the enhancement layer in an attempt to preserve a forward-shifted enhancement layer, otherwise, if the enhancement layer does contain important content, delay transmission of portions of the enhancement layer so that the delayed portions can be transmitted when the available bandwidth increases; determine that the available bandwidth has increased; responsive to determining that the available bandwidth has increased, transmit the delayed portions of the enhancement layer; and after transmitting the delayed portions, drop additional portions of the enhancement layer sufficient to forward shift other portions of the enhancement layer by an amount.
53. The one or more computer-readable media of claim 52, wherein the computer-readable instructions cause the processors to first drop said enhancement layer portions in the absence of a decrease in available bandwidth.
54. The one or more computer-readable media of claim 52, wherein the computer-readable instructions cause the processors to first drop said enhancement layer portions prior to streaming any enhancement layer portions.
55. One or more computer-readable media having computer-readable instructions thereon which, when executed by one or more processors, cause the processors to: actively drop portions of an enhancement layer comprising part of an FGS stream when available bandwidth is constant to effectively forward shift following portions of the enhancement layer; analyze content of the enhancement layer to determine its relative importance; and if content is determined to not be important, drop associated portions of the enhancement layer, otherwise, delay transmission of associated portions of the enhancement layer.
56. The one or more computer-readable media of claim 55, wherein the computer-readable instructions cause the processors to analyze said content responsive to a decrease in available bandwidth.
57. A method comprising: providing a streaming server configured to stream data to at least one client, said data comprising one or more enhancement layers; determining whether portions of the one or more enhancement layers are important by taking into account motion that is embodied in said portions; making a decision to drop enhancement layer portions based, at least in part, on whether the portions are important.
58. The method of claim 57, wherein said act of determining comprises: computing perceived motion energy (PME) values for multiple B frames of a video stream; representing the video stream as a PME value sequence; segmenting the PME value sequence into individual segments, each segment corresponding to multiple frames; within each segment, assigning one or more priority levels; and wherein said act of making a decision comprises dropping frames having lower priorities when available bandwidth decreases.
59. The method of claim 58, wherein said computing comprises taking the product of an average of motion vector magnitudes within a frame and a percentage of dominant motion direction.
60. The method of claim 58, wherein the PME values reflect a degree of perceived motion judder if a frame is dropped.
61. The method of claim 58, wherein the multiple frames comprise B frames.
62. The method of claim 58, wherein said segmenting comprises using a model to detect patterns in the PME value sequence so that the detected patterns can be segmented into individual segments.
63. The method of claim 62, wherein said model comprises a triangle model.
64. A method comprising: computing perceived motion energy (PME) values for multiple frames of a video stream; representing the video stream as a PME value sequence; segmenting the PME value sequence into individual segments, each segment corresponding to multiple frames; within each segment, assigning one or more priority levels; and dropping frames having lower priorities when available bandwidth decreases.
65. The method of claim 64, wherein said computing comprises taking the product of an average of motion vector magnitudes within a frame and a percentage of dominant motion direction.
66. The method of claim 64, wherein the PME values reflect a degree of perceived motion judder if a frame is dropped.
67. The method of claim 64, wherein the multiple frames comprise B frames.
68. The method of claim 64, wherein said segmenting comprises using a model to detect patterns in the PME value sequence so that the detected patterns can be segmented into individual segments.
69. The method of claim 68, wherein said model comprises a triangle model.
70. The method of claim 69, wherein left bottom vertices of the model represent starting points of individual segments, right bottom vertices of the model represent end points of individual segments, and top vertices of the model represent maximum PME values of individual segments.
71. The method of claim 64, wherein said segmenting comprises, prior to using the model, splitting one or more portions of the PME value sequence that correspond to continuous motion by defining splitting boundaries relative to the PME value sequence.
72. The method of claim 64, wherein said assigning of the priority levels takes place by considering degraded quality if frames are dropped.
73. The method of claim 64 further comprising dropping frames with lower priority levels when bandwidth is constant to forward shift frames that are not dropped.
74. The method of claim 64, wherein said assigning of the priority levels comprises assigning individual B frame portions a class in accordance with the peak PME value of its associated segment, wherein any one class is associated with one or more models that provide a mapping between classes and individual priority levels.
75. A method comprising: providing one or more models that associate perceived motion energy (PME) values for video segments with individual classes, each class being associated with a score that pertains to human-perceived impairment caused by frame rate decreases; assigning individual video segments a class in accordance with a segment's peak PME value; mapping the assigned class to an individual priority level; and using the priority levels to ascertain one or more frames within a video segment to drop.
76. The method of claim 75, wherein providing one or more models comprises providing multiple models each of which being associated with a different frame rate.
77. The method of claim 75, wherein said using comprises dropping frames having lower priority levels than other frames having higher priorities.
78. The method of claim 75, wherein said using comprises dropping frames having lower priority levels than other frames having higher priorities when available bandwidth is stable.
79. The method of claim 75, wherein said one or more frames comprise B-frames.
80. A system comprising: a rate controller comprising: a state machine for implementing perceptual rate adaptation through mode and state transitions; a virtual buffer model communicatively coupled with and providing feedback to the state machine for describing dynamical buffer filling and draining processes; and a bandwidth allocation module communicatively coupled with the state machine for allocating bandwidth to frames given the state and mode of the state machine.
81. The system of claim 80, wherein said frames comprise B frames.
82. The system of claim 80, wherein the virtual buffer model comprises a frame FIFO queue.
83. The system of claim 80, wherein the virtual buffer model comprises: an input for a sending frame rate associated with a sliding window size; an output associated with a constant frame consumption rate; and a feedback output provided to the state machine and associated with a number of buffered frames after a given time slot.
84. The system of claim 80, wherein state machine modes define protected priority levels associated with frames that are to be protected.
85. The system of claim 84, wherein priority levels that are higher than or equal to a current mode are protected.
86. The system of claim 84, wherein the state machine is configured to transit modes if bandwidth or stream bit rate changes.
87. The system of claim 80, wherein the state machine states determine a current window size that is associated with a sending frame rate.
88. The system of claim 87, wherein within each mode, state transits are configured to absorb short-term bandwidth or stream bit-rate variations.
89. The system of claim 80, wherein the state machine comprises: a first state associated with a state in which available bandwidth is not fluctuating; a second state in which the available bandwidth is insufficient; and a third state in which bandwidth recovers from the second state.
90. The system of claim 89, wherein: the first state attempts to maintain a window size associated with a sending frame rate; the second state decreases the window size and uses a client buffer for additional bandwidth; and the third state increases the window size by actively dropping unprotected frames.
91. The system of claim 90, wherein the third state begins to fill a client buffer in anticipation of a bandwidth decrease.
92. A method comprising: providing a state machine for implementing perceptual rate adaptation through mode and state transitions, wherein state machine modes define protected priority levels associated with frames that are to be protected, the state machine being configured to transit modes if bandwidth or stream bit rate changes in the sense of long-term, the state machine comprising a first state associated with a state in which available bandwidth is not fluctuating and in which an attempt is made to maintain a window size associated with a sending frame rate, a second state in which the available bandwidth is insufficient and in which the window size is decreased and a client buffer is used for additional bandwidth, and a third state in which bandwidth recovers from the second state and in which the window size is increased by actively dropping unprotected frames; providing a virtual buffer model communicatively coupled with and providing feedback to the state machine for describing dynamical buffer filling and draining processes, the virtual buffer model comprising an input for the sending frame rate associated with the window size, an output associated with a constant frame consumption rate, and a feedback output provided to the state machine and associated with a number of buffered frames after a given time slot; and providing a bandwidth allocation module communicatively coupled with the state machine for allocating bandwidth to frames given the state and mode of the state machine.